GPU accelerated maximum cardinality matching algorithms for bipartite graphs
We design, implement, and evaluate GPU-based algorithms for the maximum
cardinality matching problem in bipartite graphs. Such algorithms have a
variety of applications in computer science, scientific computing,
bioinformatics, and other areas. To the best of our knowledge, ours is the
first study that focuses on GPU implementations of maximum cardinality
matching algorithms. We compare the proposed algorithms with serial and
multicore implementations from the literature on a large set of real-life
problems. In the majority of cases, one of our GPU-accelerated algorithms
is demonstrated to be faster than both the sequential and multicore
implementations.
Comment: 14 pages, 5 figures
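For background, the sketch below gives a minimal serial baseline for maximum cardinality bipartite matching (the classic augmenting-path approach); it is not the authors' GPU algorithm, and the adjacency encoding and example graph are assumptions made purely for illustration.

# A minimal serial baseline: Kuhn's augmenting-path algorithm for
# maximum cardinality matching in a bipartite graph.
def max_bipartite_matching(adj, n_left, n_right):
    """adj[u] lists the right-side vertices adjacent to left vertex u."""
    match_right = [-1] * n_right  # match_right[v] = left vertex matched to v, or -1

    def try_augment(u, visited):
        # Look for an augmenting path starting at the unmatched left vertex u.
        for v in adj[u]:
            if v in visited:
                continue
            visited.add(v)
            if match_right[v] == -1 or try_augment(match_right[v], visited):
                match_right[v] = u
                return True
        return False

    matching_size = 0
    for u in range(n_left):
        if try_augment(u, set()):
            matching_size += 1
    return matching_size, match_right

# Example: a 3x3 bipartite graph that admits a perfect matching.
adj = {0: [0, 1], 1: [0], 2: [1, 2]}
print(max_bipartite_matching(adj, 3, 3))  # -> (3, [1, 0, 2])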
Combinatorial Problems in High-Performance Computing: Partitioning
This extended abstract presents a survey of combinatorial problems
encountered in scientific computations on today's
high-performance architectures, with sophisticated memory
hierarchies, multiple levels of cache, and multiple processors
on chip as well as off-chip.
For parallelism, the most important problem is to partition
sparse matrices, graphs, or hypergraphs into nearly equal-sized
parts while trying to reduce inter-processor communication.
Common approaches to such problems involve multilevel
methods based on coarsening and uncoarsening (hyper)graphs,
matching of similar vertices, searching for good separator sets
and good splittings, dynamical adjustment of load imbalance,
and two-dimensional matrix splitting methods
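To illustrate the coarsening-by-matching step mentioned above, the sketch below shows one greedy heavy-edge matching pass. It is a simplified stand-in for the heuristics used in real multilevel partitioners, and the weighted-adjacency encoding is an assumption made for the example only.

# One coarsening pass of a multilevel partitioner: greedily match each vertex
# to an unmatched neighbor along its heaviest incident edge, so matched pairs
# can be collapsed into coarse vertices.
def heavy_edge_matching(adj):
    """adj[u] is a dict {neighbor: edge weight}. Returns the matched (u, v) pairs."""
    matched = set()
    pairs = []
    for u in adj:
        if u in matched:
            continue
        best, best_w = None, 0
        for v, w in adj[u].items():
            if v not in matched and v != u and w > best_w:
                best, best_w = v, w
        if best is not None:
            matched.update((u, best))
            pairs.append((u, best))
    return pairs

# Example: a small weighted graph; the heavy edges (0,1) and (2,3) get matched.
adj = {0: {1: 5, 2: 1}, 1: {0: 5, 3: 2}, 2: {0: 1, 3: 4}, 3: {1: 2, 2: 4}}
print(heavy_edge_matching(adj))  # -> [(0, 1), (2, 3)]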
Parallelization of Mapping Algorithms for Next Generation Sequencing Applications
With the advent of next-generation high throughput sequencing
instruments, large volumes of short sequence data are generated at an
unprecedented rate. Processing and analyzing these massive datasets
requires overcoming several challenges. A particular challenge
addressed in this abstract is the mapping of short sequences (reads)
to a reference genome by allowing mismatches. This is a significantly
time-consuming combinatorial problem in many applications, including
whole-genome resequencing, targeted sequencing, transcriptome/small
RNA, DNA methylation, and ChIP sequencing, and takes time on the order
of days using existing sequential techniques on large scale
datasets. In this work, we introduce six parallelization methods, each
with different scalability characteristics, to speed up short sequence
mapping. We also address an associated load balancing problem that
involves grouping nodes of a tree from different levels. This problem
arises due to a trade-off between computational cost and granularity
while partitioning the workload. We comparatively present the
proposed parallelization methods and give theoretical cost models for
each of them. Experimental results on real datasets demonstrate the
effectiveness of the methods and indicate that they are successful at
reducing the execution time from the order of days to just a few
hours for large datasets.
To the best of our knowledge, this is the first study on
parallelization of the short sequence mapping problem.
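For context, the sketch below shows a deliberately naive serial kernel for the underlying task: aligning a short read against every position of a reference while tolerating a bounded number of mismatches. It only illustrates the combinatorial cost being parallelized; the six parallelization methods and the tree-based load balancing scheme are not reproduced, and the toy reference and read are assumptions for the example.

# Naive mapping of a short read to a reference, allowing up to max_mismatches
# mismatches per alignment position.
def map_read(reference, read, max_mismatches):
    """Return all 0-based positions where `read` aligns with at most `max_mismatches` mismatches."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        mismatches = 0
        for r_char, g_char in zip(read, reference[pos:pos + len(read)]):
            if r_char != g_char:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:
            # Reached only when the loop finished without exceeding the mismatch budget.
            hits.append(pos)
    return hits

# Example: two exact occurrences of the read in a toy reference.
print(map_read("ACGTACGTAC", "CGTA", 1))  # -> [1, 5]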
Finding the Hierarchy of Dense Subgraphs using Nucleus Decompositions
Finding dense substructures in a graph is a fundamental graph mining
operation, with applications in bioinformatics, social networks, and
visualization, to name a few. Yet most standard formulations of this problem
(like clique, quasiclique, k-densest subgraph) are NP-hard. Furthermore, the
goal is rarely to find the "true optimum", but to identify many (if not all)
dense substructures, understand their distribution in the graph, and ideally
determine relationships among them. Current dense subgraph finding algorithms
usually optimize some objective, and only find a few such subgraphs without
providing any structural relations. We define the nucleus decomposition of a
graph, which represents the graph as a forest of nuclei. Each nucleus is a
subgraph where smaller cliques are present in many larger cliques. The forest
of nuclei is a hierarchy by containment, where the edge density increases as we
proceed towards leaf nuclei. Sibling nuclei can have limited intersections,
which enables discovering overlapping dense subgraphs. With the right
parameters, the nucleus decomposition generalizes the classic notions of
k-cores and k-truss decompositions. We give provably efficient algorithms for
nucleus decompositions, and empirically evaluate their behavior in a variety of
real graphs. The tree of nuclei consistently gives a global, hierarchical
snapshot of dense substructures, and outputs dense subgraphs of higher quality
than other state-of-the-art solutions. Our algorithm can process graphs with
tens of millions of edges in less than an hour.
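Since the nucleus decomposition generalizes k-cores, the sketch below conveys the flavor of such decompositions with the classic k-core peeling algorithm. It is not the (r, s)-nucleus algorithm from the paper, and the example graph is made up.

# k-core decomposition by peeling: repeatedly remove vertices of remaining
# degree at most k, assigning them core number k, then increase k.
def core_numbers(adj):
    """adj[u] is a set of neighbors. Returns {vertex: core number}."""
    degree = {u: len(adj[u]) for u in adj}
    core = {}
    remaining = set(adj)
    k = 0
    while remaining:
        peel = [u for u in remaining if degree[u] <= k]
        if not peel:
            k += 1
            continue
        for u in peel:
            core[u] = k
            remaining.discard(u)
            for v in adj[u]:
                if v in remaining:
                    degree[v] -= 1
    return core

# Example: a triangle (core number 2) with one pendant vertex (core number 1).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(core_numbers(adj))  # -> {3: 1, 0: 2, 1: 2, 2: 2}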
Graph manipulations for fast centrality computation
The betweenness and closeness metrics are widely used in network analyses, yet they are expensive to compute. For that reason, making the betweenness and closeness centrality computations faster is an important and well-studied problem. In this work, we propose BADIOS, a framework that manipulates the graph by compressing it and splitting it into pieces so that the centrality computation can be handled independently for each piece. Although BADIOS is designed and fine-tuned for exact betweenness and closeness centrality, it can easily be adapted for approximate solutions as well. Experimental results show that the proposed techniques substantially reduce the centrality computation time for networks of various types and sizes. In particular, BADIOS reduces the betweenness centrality computation time of a graph with 4.6 million edges from more than 5 days to less than 16 hours. For the same graph, the closeness computation time drops from more than 3 days to 6 hours (a 12.7x speedup).
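The baseline that BADIOS accelerates performs one shortest-path computation per source vertex; the sketch below shows that baseline for exact closeness centrality on an unweighted, connected graph. The compression and splitting preprocessing that gives BADIOS its speedups is not shown, and the example graph is an assumption.

# Exact closeness centrality on an unweighted graph: one BFS per source vertex.
from collections import deque

def closeness_centrality(adj):
    """adj[u] is an iterable of neighbors. Returns {vertex: closeness} for a connected graph."""
    closeness = {}
    for source in adj:
        # BFS distances from `source` to every reachable vertex.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        closeness[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return closeness

# Example: a path 0 - 1 - 2; the middle vertex is the most central.
adj = {0: [1], 1: [0, 2], 2: [1]}
print(closeness_centrality(adj))  # -> {0: 0.666..., 1: 1.0, 2: 0.666...}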
Incremental closeness centrality in distributed memory
Networks are commonly used to model traffic patterns, social interactions, or web pages. The vertices in a network do not all possess the same characteristics: some vertices are naturally more connected, and some are more important. Closeness centrality (CC) is a global metric that quantifies how important a given vertex is in the network. When the network is dynamic and keeps changing, the relative importance of the vertices also changes. The cost of the best known algorithm for computing CC scores makes it impractical to recompute them from scratch after each modification. In this paper, we propose Streamer, a distributed-memory framework for incrementally maintaining the closeness centrality scores of a network upon changes. It leverages pipelined and replicated parallelism as well as SpMM-based BFSs, and it takes NUMA effects into account. It makes maintaining the closeness centrality values of real-life networks with millions of interactions significantly faster and obtains almost linear speedups on a 64-node cluster with 8 threads per node.
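A minimal sketch of the kind of filtering that makes incremental closeness maintenance viable is given below: after inserting edge (a, b), a source vertex s can only have its closeness change if its old distances to a and b differ by more than one, so BFSs are redone only from such sources. This is a simplified, shared-memory illustration under that assumption; Streamer's distributed, pipelined, SpMM-based machinery is not reproduced, and the example graph is made up.

# Incremental closeness maintenance after one edge insertion, using the
# |d(s, a) - d(s, b)| > 1 filter to skip unaffected source vertices.
from collections import deque

def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def update_closeness(adj, closeness, a, b):
    """Insert edge (a, b) into adj (dict of sets) and refresh only the affected scores."""
    dist_a = bfs_distances(adj, a)  # distances in the old graph
    dist_b = bfs_distances(adj, b)
    adj[a].add(b)
    adj[b].add(a)
    for s in adj:
        # Unaffected sources keep their old closeness value.
        if abs(dist_a.get(s, len(adj)) - dist_b.get(s, len(adj))) <= 1:
            continue
        dist = bfs_distances(adj, s)
        total = sum(dist.values())
        closeness[s] = (len(dist) - 1) / total if total > 0 else 0.0
    return closeness

# Example: a path 0 - 1 - 2 - 3; insert the chord (0, 3).
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
closeness = closeness_from_scratch = {
    s: (len(bfs_distances(adj, s)) - 1) / sum(bfs_distances(adj, s).values()) for s in adj
}
print(update_closeness(adj, closeness, 0, 3))  # only vertices 0 and 3 are recomputed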